QMIR 2026

Week 2: Reproducible Research with R, Quarto & Git

Tristan Muno
February 13, 2026

Course Logistics & Philosophy

  • Practical
    • Send me your GitHub username
    • If you email me, please include “QMIR” in the subject line
  • Philosophy
    • I will show you a powerful workflow – not a rigid religion
    • Tools should serve you, not the other way around – be pragmatic, not dogmatic!
    • 80/20 principle – don’t let perfect be the enemy of good

Week 2 Goals

By the end of today, you should:

  1. Understand how Quarto + R + Git + GitHub fit together
  2. Be able to create a project with version control
  3. Have pushed your first reproducible document to GitHub

Agenda

  1. The Big Picture – What is Reproducible Research?
  2. What is Literate Programming?
  3. Version Control – The Concept
  4. GitHub as Infrastructure
  5. Walkthrough & Exercises

The Big Picture – What is Reproducible Research?

What is Reproducible Research?

  • Reproducible research means:
    • Every result can be regenerated from raw data
    • Every transformation and coding decision is documented
    • Every table and figure can be recreated automatically
    • Analytical choices are transparent and traceable
  • Why this matters:
    • Reviewers can understand and verify your analysis
    • Coauthors can follow and extend your work
    • You can easily test alternative specifications (e.g., add variable X)
    • Your future self knows exactly what you did
    • If your laptop dies tomorrow, nothing is lost

What is Reproducible Research?

  • Quarto \(\rightarrow\) integrates writing and code ✍️💻
  • R \(\rightarrow\) performs computation and analysis 📊🧮
  • Git \(\rightarrow\) tracks changes over time 🔄📝
  • GitHub \(\rightarrow\) enables remote storage and collaboration ☁️🤝

What is Reproducible Research?

Used properly, this workflow promotes

  • Reproducibility
  • Transparency
  • Traceability
  • Automation

What is Reproducible Research?

  • For small or one-off tasks, this workflow may seem excessive
  • Its advantages become substantial as projects increase in scope, duration, or collaboration

The Workflow

.qmd (source document) \(\rightarrow\) rendered output (e.g. HTML, PDF, DOCX) \(\rightarrow\) local Git repository (snapshot) \(\rightarrow\) GitHub (remote repository)
Figure 1: The basic workflow
Figure 2: Illustration of Quarto. Source: Posit Software, PBC (2025)
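
From a terminal, the workflow in Figure 1 could look roughly like the sketch below. It runs in dry-run mode by default and only prints the commands. The file name `report.qmd`, the remote name `origin`, and the branch name `main` are assumptions for illustration, not something the slides prescribe.

```shell
# Dry-run sketch of the render -> commit -> push cycle.
# Assumptions: a file called report.qmd, a remote called "origin",
# and a branch called "main".
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually execute the commands

run() {
  # Print the command; execute it only when dry-run mode is off.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

workflow() {
  run quarto render report.qmd        # .qmd -> rendered output
  run git add report.qmd              # stage the source document
  run git commit -m "Update report"   # local snapshot
  run git push origin main            # send the snapshot to GitHub
}

workflow
```

Whether the rendered output is committed alongside the source is a project decision; this sketch stages only the `.qmd` file.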

What is Literate Programming?

Literate Programming

Core idea (Knuth 1984):

The document is the analysis

Our tool for literate programming: Quarto markdown files (.qmd)

  • Markdown
  • Code cells
  • Output embedded in document
  • No copy-paste between different programs

Markdown

# Section header

## A subsection

*This sentence is written in italics.*
**This one is bold.**
Using backticks formats text to `look like code`.

[This is a hyperlink](https://www.uni-mannheim.de)

Section header

A subsection

This sentence is written in italics. This one is bold. Using backticks formats text to look like code.

This is a hyperlink

Code Cells

## A subsection

*This sentence is written in italics.*
**This one is bold.**
Using backticks formats text to `look like code`.

```{r}
#| echo: true
#| eval: true
age <- c(18, 23, 21, 57, 24, 19)
mean_age <- mean(age)
```

The mean age is `{r} mean_age`.

A subsection

This sentence is written in italics. This one is bold. Using backticks formats text to look like code.

age <- c(18, 23, 21, 57, 24, 19)
mean_age <- mean(age)

The mean age is 27.

Output embedded in documents

## Example of embedded figure

```{r}
#| echo: true
#| eval: true
#| label: fig-demo
#| fig-cap: "A demo figure"
library(forcats)
library(ggplot2)

demo_data <- data.frame(
  country = c("Germany", "France", "Italy", "Spain", "Poland"),
  pop = c(83, 67, 60, 47, 38)
)

ggplot(
  data = demo_data,
  aes(x = fct_reorder(country, pop), y = pop)
) +
  geom_col(fill = "white", color = "black") +
  theme_bw() +
  labs(x = "Country", y = "Population in Million")
```

@fig-demo shows an embedded figure.

Example of embedded figure

library(forcats)
library(ggplot2)

demo_data <- data.frame(
  country = c("Germany", "France", "Italy", "Spain", "Poland"),
  pop = c(83, 67, 60, 47, 38)
)

ggplot(
  data = demo_data,
  aes(x = fct_reorder(country, pop), y = pop)
) +
  geom_col(fill = "white", color = "black") +
  theme_bw() +
  labs(x = "Country", y = "Population in Million")
Figure 3: A demo figure

Figure 3 shows an embedded figure.

Version Control with Git

Version Control with Git

  • Repository (repo) \(\rightarrow\) project folder with memory 🗂️
  • git init \(\rightarrow\) Turns a normal folder into a Git repository 🚀
  • Stage (git add) \(\rightarrow\) Select changes you want to include in the next snapshot 📦
  • Commit \(\rightarrow\) A saved snapshot of staged changes 📸
  • Commit message \(\rightarrow\) Short explanation of what you changed and why 💬
  • History (git log) \(\rightarrow\) Timeline of all committed snapshots 🕰️
  • Remote \(\rightarrow\) Online copy of your repository (e.g., GitHub) ☁️
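
The vocabulary above maps onto a handful of commands. The following is a minimal sketch in a throwaway folder. The file name `analysis.R`, the commit message, and the demo identity are assumptions chosen for illustration.

```shell
# Sketch: init -> stage -> commit -> log in a temporary folder.
# File name, commit message, and identity are illustrative assumptions.
repo="$(mktemp -d)"            # a normal, empty folder
cd "$repo" || exit 1

git init -q                    # turn the folder into a Git repository

echo 'x <- 1' > analysis.R     # create a file to track
git add analysis.R             # stage: select it for the next snapshot

# Commit: save the staged snapshot with a short message.
# The -c flags set a demo identity for this one command only.
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q -m "Add first analysis script"

git --no-pager log --oneline   # history: one line per committed snapshot
```

On your own machine you would normally set your name and email once with `git config --global` instead of passing `-c` flags on every commit.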

Version Control with Git

What we want to avoid:

analysis_final.R
analysis_final2.R
analysis_REALLY_final.R

draft1.pdf
draft2.pdf
draft2_final.pdf

This is manual version control – and it breaks down quickly.

Remember: Git tracks changes within files instead of duplicating them.
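
To see what this looks like in practice, the sketch below edits a single file twice and lets Git show the difference, instead of keeping several copies side by side. The model formulas and the demo identity are made up for illustration.

```shell
# Sketch: one file, two snapshots, and the difference between them.
# File contents and the demo identity are illustrative assumptions.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q

# Helper so each commit carries a demo identity without touching global config.
commit() {
  git -c user.name="Demo User" -c user.email="demo@example.com" commit -q -m "$1"
}

echo 'model <- lm(y ~ x)' > analysis.R
git add analysis.R
commit "Fit baseline model"

echo 'model <- lm(y ~ x + z)' > analysis.R    # edit the same file in place
git add analysis.R
commit "Add control variable z"

git --no-pager log --oneline                     # two snapshots, still one file
git --no-pager diff HEAD~1 HEAD -- analysis.R    # removed line vs. added line
```

The folder contains only `analysis.R`, yet both versions remain recoverable from the history.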

GitHub as Infrastructure

GitHub as Infrastructure

  • Remote backup of your repositories ☁️
  • Collaboration via pull requests & issues 👥
  • Transparency & reproducibility 🔎
  • Platform for public research visibility 📢
  • Hub for open-source software development 🧩

Today, sharing code on GitHub is often part of the research output itself.

A Critical Note on Platform Choice

  • GitHub is a US-based company owned by Microsoft
  • There are alternatives (e.g. GitLab, Codeberg)
  • Platform choice is therefore also a question of:
    • Data governance
    • Digital sovereignty
    • Infrastructure dependence

🔐 Reminder: Uploading data to US-hosted platforms can have legal implications. Never upload personal or restricted data without checking GDPR and institutional regulations.
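
One practical safeguard, offered here as a suggestion rather than a course requirement, is to list sensitive folders in a `.gitignore` file so that Git never stages them. The sketch assumes a hypothetical `data/raw/` folder holding restricted data and uses `git check-ignore` to confirm that the rule applies.

```shell
# Sketch: keep a (hypothetical) restricted-data folder out of version control.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q

mkdir -p data/raw
echo 'id,name' > data/raw/survey.csv    # stand-in for restricted data
echo 'data/raw/' > .gitignore           # tell Git to ignore the whole folder

git add .                               # stages .gitignore, skips data/raw/
git status --short                      # lists only .gitignore as added

# check-ignore exits with status 0 when the path matches an ignore rule
if git check-ignore -q data/raw/survey.csv; then
  echo "data/raw/survey.csv is ignored"
fi
```

Ignoring a file does not remove it from earlier commits, so the rule should be in place before the data is ever staged.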

So why use GitHub?

  • It is the industry standard 🌐
  • Most open-source projects are hosted there 🧑‍💻
  • Most active developers and scientists use it 🧪
  • Learning GitHub increases career portability 📚

For these pragmatic reasons, we will use GitHub in this course.

Time to Practice!

Test if Git is installed

git --version

Live Demo 1: Create new empty project

Live Demo 2: Clone repo from GitHub
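
The clone, commit, and push cycle can be rehearsed without network access. In the sketch below a local bare repository stands in for GitHub. With a real project, `remote.git` would be replaced by the repository URL shown on GitHub. All file names and the demo identity are assumptions.

```shell
# Sketch: clone -> commit -> push, with a local bare repo standing in for GitHub.
work="$(mktemp -d)"
cd "$work" || exit 1

git init -q --bare remote.git      # stand-in for the GitHub repository
git clone -q remote.git project    # with GitHub: git clone <repository URL>
cd project || exit 1

echo '# My first project' > README.md
git add README.md
git -c user.name="Demo User" -c user.email="demo@example.com" \
  commit -q -m "Add README"

git push -q origin HEAD            # send the current branch to the remote
git ls-remote origin               # list what the remote now contains
```

Cloning an empty repository prints a warning, which is expected here because the stand-in remote has no commits yet.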

Thank you for your attention and see you next week!

Please make sure to install all required software and send me your GitHub username or email address.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111. https://doi.org/10.1093/comjnl/27.2.97.
Posit Software, PBC. 2025. Publish and Share with Quarto: Quarto Cheatsheet. Posit. https://rstudio.github.io/cheatsheets/html/quarto.html.